
Final Project
Group 1:
Hiba Awan
Nathania Stephens
Abstract
Introduction & Background
Motivation/ Purpose
In 2023, there were over 30,000 arrests and close to 65,000 citations in Fairfax County. The county's boundaries include areas such as Centreville, Chantilly, Herndon, Reston, Tysons Corner, McLean, Merrifield, George Mason, Annandale, Burke, Springfield, Alexandria, and Lorton, to name a few. If you live, work, or study in these areas, this project should be of interest to you. It aims to inform people in Fairfax County about local crime and to provide statistical insights they can apply.
Goals/ Objectives
To provide relevant and insightful crime information, several visualization methods were applied to make the data easy to interpret and compare. Statistical learning techniques were used to identify statistically significant factors and associations between variables. Since the data used for this project is largely categorical, the project focuses on techniques such as the Chi-Squared Test, Logistic Regression, Decision Trees, and Random Forest.
Data
Overview
About the Data
Three datasets were pulled from the Fairfax County Police Department website. They cover arrests, citations, and warnings in the year 2023. For simplicity, general definitions are provided:
Arrest - When a person is taken into custody to answer for an offense or when there is a deprivation or restraint of a person’s liberty in any significant way.
Citation - Formal notice issued by a law enforcement officer for a violation of law, typically related to traffic laws or other minor offenses. It typically requires the violator to appear in court or pay a fine.
Warning - When a violation, typically minor, has been made but an officer issues a warning rather than a citation.
The datasets included between 24 and 34 variables, but many of the variables were redundant or not applicable to the research (e.g. web_address, phone_number, name). The following attributes were key to the research conducted:
| Column Name | Data Type | Description |
|---|---|---|
| Date | Date | Date of Violation |
| Time | Chr | Time of Violation |
| Offense | Chr | Description of Violation |
| Gender | Chr | Gender of Violator |
| Ethnicity | Chr | Hispanic or Non-Hispanic |
| District | Chr | Administrative area |
| Latitude | Dbl | Coordinate measuring north/south of the equator |
| Longitude | Dbl | Coordinate measuring east/west of the prime meridian |
| Outcome | Chr | Result of the violation: arrest, citation, or warning |
Limitations and Assumptions
Due to the nature of the data available on the Fairfax County Police Department (FCPD) website, analysis was limited to qualitative techniques. The approach taken for the project focused on predicting qualitative responses, that is, classification. This means that each record pulled from FCPD is assigned to a category or class.
While understanding local crime is the goal of this project, the data acquired only accounts for crime that was recorded by FCPD. It does not take into account crimes that were not reported or that were reported through channels other than FCPD.
Cleaning and Transformation
To address questions related to gender, the data needed to be standardized and correctly categorized. Column names had to be consistent across the three datasets before they could be merged, so Gender was used in place of Sex. Next, the column values were transformed to consistent labels, e.g. Male, Female, and Other/Unknown. The total proportion of each Gender class was then examined to verify that the Other/Unknown class could be removed without…
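The standardization steps described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the code used for the project (the output later in the report appears to come from R). The raw labels in `GENDER_LABELS`, the toy rows, and the helper names are all assumptions made for the example; the actual codes in the FCPD files may differ.

```python
from collections import Counter

# Hypothetical raw labels; the real FCPD files may use different codes.
GENDER_LABELS = {
    "m": "Male", "male": "Male",
    "f": "Female", "female": "Female",
}

def standardize_record(record):
    """Return a copy of the record with one 'Gender' column and a consistent label."""
    cleaned = dict(record)
    # Use 'Gender' as the shared column name; some files may call it 'Sex'.
    raw_sex = cleaned.pop("Sex", None)
    if "Gender" not in cleaned:
        cleaned["Gender"] = raw_sex
    key = str(cleaned["Gender"] or "").strip().lower()
    # Anything not recognized falls into the Other/Unknown class.
    cleaned["Gender"] = GENDER_LABELS.get(key, "Other/Unknown")
    return cleaned

def gender_proportions(records):
    """Share of records in each Gender class."""
    counts = Counter(r["Gender"] for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Toy rows standing in for two of the datasets (illustrative only).
citation_rows = [{"Sex": "M", "Outcome": "Citation"},
                 {"Sex": "F", "Outcome": "Citation"}]
warning_rows = [{"Gender": "female", "Outcome": "Warning"},
                {"Gender": "", "Outcome": "Warning"}]

combined = [standardize_record(r) for r in citation_rows + warning_rows]
proportions = gender_proportions(combined)
print(proportions)
```

Checking `proportions` before dropping a class makes it visible how many records the removal would affect.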
Research Questions
Is there an association between gender and warnings?
Are there other factors that determine if someone gets out of a “ticket”? OR Are you more likely to get a ticket at the end of the month? (Some believe that police officers have a monthly quota.)
Research & Analysis
Question 1: Is there an association between gender and warnings?
To address this question, the null and alternative hypotheses are established.
Null Hypothesis: There is no link between gender and violation outcome (warning or citation).
Alternative Hypothesis: There is a link between gender and violation outcome.
The cleaned and combined dataset for warnings and citations contains a total of 88,320 records. The counts for each outcome (citation or warning) show that FCPD gives out far more citations than warnings. A stacked bar chart of these counts also shows that males have a higher count in both categories.
Next, the warning rate for each gender is calculated. This is the probability that a male or female violator receives a warning instead of a citation, i.e. gets out of a ticket. To calculate the warning rate, the number of warnings is divided by the total number of incidents. \[ \text{Warning Rate} = \frac{\text{Number of Warnings}}{\text{Total Incidents (Warnings + Citations)}} \]
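As a worked example of the formula, the following Python sketch computes the warning rate for each gender from the observed counts reported in the contingency table later in this section. The dictionary layout and the `warning_rate` helper are illustrative choices, not part of the original analysis.

```python
# Observed counts taken from the contingency table reported in this section.
observed = {
    "Female": {"citations": 20478, "warnings": 8777},
    "Male": {"citations": 43657, "warnings": 15408},
}

def warning_rate(counts):
    """Warnings divided by total incidents (warnings + citations)."""
    return counts["warnings"] / (counts["warnings"] + counts["citations"])

rates = {gender: warning_rate(counts) for gender, counts in observed.items()}
for gender, rate in rates.items():
    print(f"{gender}: {rate:.1%}")
```

With these counts the rates come out to roughly 30.0% for females and 26.1% for males.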
This shows a slight difference in proportion between the two genders, with females having a higher warning rate than males. Is this difference significant or is it a result of chance or other factors? To help understand these results, the Chi-Squared Test is used.

\[ \chi^2 = \sum \frac{(O-E)^2}{E} \] To implement the Chi-Squared Test, the observed counts (O) are compared with the counts that would be expected (E) if gender and outcome were independent.
| Gender | Expected Citations | Observed Citations | Expected Warnings | Observed Warnings |
|---|---|---|---|---|
| Male | 42,891 | 43,657 | 16,174 | 15,408 |
| Female | 21,244 | 20,478 | 8,011 | 8,777 |
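The expected counts in the table can be reproduced from the observed counts alone: under independence, each expected cell equals its row total times its column total, divided by the grand total. The sketch below is a Python illustration of that arithmetic; the variable names are assumptions made for the example.

```python
# Observed counts by gender and outcome, as reported in this section.
observed = {
    "Female": {"citation": 20478, "warning": 8777},
    "Male": {"citation": 43657, "warning": 15408},
}

outcomes = ["citation", "warning"]
row_totals = {g: sum(cells.values()) for g, cells in observed.items()}
col_totals = {o: sum(observed[g][o] for g in observed) for o in outcomes}
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    g: {o: row_totals[g] * col_totals[o] / grand_total for o in outcomes}
    for g in observed
}

for gender, cells in expected.items():
    print(gender, {o: round(v, 2) for o, v in cells.items()})
```

Each row of expected counts sums to the same total as the matching row of observed counts, which is a quick consistency check.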
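Observed counts (BinaryOutcome: 0 = citation, 1 = warning):
| Gender | 0 | 1 |
|---|---|---|
| Female | 20478 | 8777 |
| Male | 43657 | 15408 |
Pearson's Chi-squared test with Yates' continuity correction
data: contingency_tbl
X-squared = 150.62, df = 1, p-value < 2.2e-16
Expected counts (BinaryOutcome: 0 = citation, 1 = warning):
| Gender | 0 | 1 |
|---|---|---|
| Female | 21243.99 | 8011.007 |
| Male | 42891.01 | 16173.993 |
Because the p-value is far below any conventional significance level (e.g. 0.05), the null hypothesis is rejected: the data show a statistically significant association between gender and violation outcome.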
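The test statistic reported above can be checked by hand. The following Python sketch applies the chi-squared formula with Yates' continuity correction to the observed 2x2 table, using only the standard library. It is an illustration, not the code behind the report (the output appears to come from R). The function name is hypothetical, and the p-value shortcut via `math.erfc` is valid only for one degree of freedom.

```python
import math

# Rows: Female, Male. Columns: citation (0), warning (1).
observed = [[20478, 8777],
            [43657, 15408]]

def chi_squared_yates(table):
    """Chi-squared statistic with Yates' correction for a 2x2 table, plus p-value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            # Yates' correction: shrink |O - E| by 0.5, never below zero.
            diff = max(abs(obs - exp) - 0.5, 0.0)
            stat += diff ** 2 / exp
    # For df = 1, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_squared_yates(observed)
print(f"X-squared = {stat:.2f}, df = 1")
print(f"p-value below 2.2e-16: {p_value < 2.2e-16}")
```

The statistic rounds to 150.62, matching the reported value.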
Other Questions…
To address each of these questions, exploratory analysis should first be done to gain an understanding and summary of the crime metrics for Fairfax County. This includes understanding which types of crime occurred most often and where.
General crime
Mapping the arrest data gives a geospatial visual of where arrests occur.
Next, we look at the Top 10 Arrest Types by Incident Based Reporting (IBR) code.
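A ranking like this can be produced by counting how often each code appears and keeping the ten most frequent. The sketch below is a Python illustration with the standard library. The code values are placeholders, not real IBR codes, and the list stands in for one column of the arrest dataset.

```python
from collections import Counter

# Placeholder values, not real IBR codes; stands in for the arrest data column.
arrest_codes = ["CODE_A", "CODE_B", "CODE_A", "CODE_C", "CODE_A", "CODE_B"]

# most_common(10) returns up to ten (code, count) pairs, most frequent first.
top_arrest_types = Counter(arrest_codes).most_common(10)

for code, count in top_arrest_types:
    print(code, count)
```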

Next, we examine the Top 10 Citations.
Warning Versus Citation
Next, warnings are compared with citations… This will help in understanding which factors could play into getting a warning or a citation.